6  Data verification with assertr

Author

Joselynn Wallace & George Dang

7 assertr

Another package that can help make your data analysis more reproducible is assertr (https://cran.r-project.org/web/packages/assertr/assertr.pdf). It provides a suite of functions for checking that your data meet your assumptions before you proceed with the analysis.

library(magrittr)
library(assertr)

Let’s work with the mtcars data. The mpg column is an example of data that shouldn’t have any negative values:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(avg.mpg = mean(mpg))
# A tibble: 3 × 2
    cyl avg.mpg
  <dbl>   <dbl>
1     4    26.7
2     6    19.7
3     8    15.1

What if this weren’t true? We can tamper with the data a little to see what happens.

mtcars$mpg[5] 
[1] 18.7
mtcars$mpg[5] <- mtcars$mpg[5] * -1
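
To see why an explicit check is worth adding, here is a minimal sketch of what happens without one. It works on a copy of the pristine dataset (the names `cars_bad`, `clean_avg`, and `bad_avg` are ours, chosen for illustration) so that it runs independently of the steps above. The grouped summary runs without any complaint; the negative value just quietly pulls down the average for the 8-cylinder group, since row 5 is an 8-cylinder car.

```r
library(magrittr)

# Copy the original data and flip the sign of one mpg value (row 5, an 8-cylinder car)
cars_bad <- datasets::mtcars
cars_bad$mpg[5] <- -cars_bad$mpg[5]

# Summary on the untouched data, for comparison
clean_avg <- datasets::mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(avg.mpg = mean(mpg))

# Same summary on the corrupted copy: no error, no warning
bad_avg <- cars_bad %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(avg.mpg = mean(mpg))

# Only the 8-cylinder average differs between the two results
clean_avg
bad_avg
```

Nothing in this output flags the problem, which is exactly the situation an up-front check is meant to catch.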

Then we can use assertr::verify to add a step that confirms there are no negative values before the summary runs. If any are found, assertr stops the pipeline with an error.

mtcars %>%
  assertr::verify(mpg >= 0) %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(avg.mpg = mean(mpg))
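
Because a failed verify() raises an ordinary R error by default, the failure can also be handled programmatically. The sketch below is one possible way to do that, not the only one: it rebuilds the corrupted data on a copy (`cars_bad` and `result` are illustrative names of ours), wraps the pipeline in base R's tryCatch(), and then shows the same rule written with assertr's assert() and within_bounds(), which check a named column against a predicate.

```r
library(magrittr)
library(assertr)

# Rebuild the corrupted data on a copy so this sketch is self-contained
cars_bad <- datasets::mtcars
cars_bad$mpg[5] <- -cars_bad$mpg[5]

# verify() fails on the negative value; tryCatch() intercepts the error
# so the script can decide what to do instead of halting
result <- tryCatch(
  cars_bad %>%
    assertr::verify(mpg >= 0) %>%
    dplyr::group_by(cyl) %>%
    dplyr::summarise(avg.mpg = mean(mpg)),
  error = function(e) {
    message("Check failed, summary not computed: ", conditionMessage(e))
    NULL
  }
)

# result is NULL here because the summary step was never reached
is.null(result)

# On clean data the check passes and the data flows through unchanged.
# assert() with within_bounds() expresses the same "no negatives" rule per column.
checked <- datasets::mtcars %>%
  assertr::assert(assertr::within_bounds(0, Inf), mpg)
```

In an unattended script it is usually better to let the error stop execution, as in the pipeline above; catching it is mainly useful when you want to log the failure or fall back to another path.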